# ANALYSIS OF 8×8-BIT MULTIPLIERS EMPLOYING ADDITIVE MULTIPLY MODULE (AMM) ARCHITECTURE

#### THAMMALI RAVI

PG Scholar, Branch: VLSI-SD J.B. INSTITUTE OF ENGINEERING AND TECHNOLOGY, Hyderabad, Telangana

essential **Abstract: Multipliers** are computational units widely utilized in digital signal processing (DSP), image processing, cryptography, high-performance and embedded architectures, where speed, power efficiency, and silicon area utilization are critical design requirements. Conventional multiplier architectures typically encounter challenges in balancing trade-offs between operational speed, hardware complexity, and power dissipation.

This work presents the design and performance evaluation of an 8×8-bit multiplier based on the Additive Multiply Module (AMM) architecture, which enhances efficiency by reducing partial product generation and simplifying accumulation logic. The proposed AMM-based multiplier is analysed with respect to key performance metrics including delay, power consumption, area utilization, and throughput, using both behavioural and structural modelling in Hardware Description Language (HDL).

Comparative analysis with traditional architectures such as array multipliers and Booth multipliers demonstrates that the AMM achieves notable improvements, particularly in low-power and high-speed operational environments. Simulation results confirm substantial performance gains, validating the AMM architecture as a promising candidate for scalable integration into larger arithmetic processing units in modern VLSI systems.

**Keywords:** AMM, Conventional architecture, High-speed, performance gains, multiplier.

#### I. INTRODUCTION

Multiplication is one of the most frequently used arithmetic operations in modern computational and embedded processing systems. It forms the core of numerous applications including digital signal processing (DSP), image and video computing, machine learning accelerators, cryptographic algorithms, and wireless communication systems. In these domains, performance parameters such as execution power efficiency, silicon utilization. and computational precision significantly influence overall system efficiency. As technology continues to scale processing real-time requirements intensify, the need for high-performance and energy-efficient multiplier architectures has grown substantially.

Traditional multiplier architectures, array multipliers such and Booth multipliers, provide reliable performance but often suffer from limitations related to hardware complexity and critical path delay. The generation and accumulation of partial products constitute the most area-consuming and time-critical stages in multiplication. In systems where large data sets are processed continuously, these inefficiencies lead to increased dynamic power consumption and throughput. reduced Hence, optimizing structures achieve better multiplier to computational performance without compromising silicon area remains a critical research problem in Very Large-Scale Integration (VLSI) design.

With the rapid growth of battery-powered portable electronics, Internet of Things (IoT) devices, and real-time embedded computing platforms, the demand for low-power and high-speed arithmetic units has intensified. Designers are consistently exploring novel computational techniques and architectural transformations to reduce power delays and resource utilization. Recent advances in multiplier optimization strategies include Wallace tree structures, Dadda multipliers, modified Booth encoding, radix-4

and radix-8 techniques, carry-save accumulation, approximate computing, hybrid logic, reversible computing, and computing-inmemory architectures. Despite these advancements, achieving an ideal trade-off among power, delay, and area remains a challenge.

The Additive Multiply Module (AMM) architecture has recently emerged as a promising approach simplifying multiplication processes. Instead of relying on conventional multiplication logic, AMM reduces computational complexity by minimizing partial product generation and streamlining accumulation operations. This reduction results in improved performance metrics, particularly critical path delay and dynamic power consumption. The AMM structure enables better scalability when used in larger arithmetic units such as Multiply-Accumulate (MAC) processors, Fast Fourier Transform (FFT) processors, and neural network accelerators.

This work focuses on the design and analysis of an 8×8-bit AMM-based multiplier implemented using Hardware Description Language (HDL). Both behavioural and structural modelling approaches are utilized to evaluate the architecture under realistic digital hardware constraints. The performance metrics considered in this study include propagation power dissipation, silicon utilization (measured in terms of logic elements or gate count), and throughput efficiency. To validate the effectiveness of the AMM architecture, comparative analysis is conducted against conventional multiplier designs such as the array multiplier and Booth multiplier.

Simulation and synthesis results demonstrate that the AMM-based multiplier significantly reduces delay and power consumption while maintaining comparable or lower area utilization. The improvements observed make the AMM architecture highly attractive for performance-critical circuits implemented in Field-Programmable Gate Arrays (FPGAs) and Application-Specific

Integrated Circuits (ASICs). Additionally, the architecture is evaluated for design scalability to determine its suitability for integration into larger VLSI arithmetic blocks and real-time computational pipelines.



Fig 1 AMM Architecture

The significance of this research lies not only in demonstrating improved arithmetic performance but also in addressing challenges faced by modern VLSI designers. As the semiconductor industry transitions toward deep-submicron and nanoscale technologies, power density and heat generation are becoming major bottlenecks. Low-power arithmetic units help extend battery life in portable systems, reduce cost and size of thermal management hardware, and improve reliability. Furthermore, high-speed multiplier performance directly enhances the capability of systems requiring heavy mathematical computation, such as AI accelerators and DSP platforms.

The increasing artificial adoption of intelligence, 5G communication, cloud data centers. and autonomous automotive electronics drives continuous demand for more efficient hardware accelerators. In such environments, even small improvements in arithmetic unit design can lead to large-scale system-level gains. Therefore, multiplier architecture becomes a strategic requirement rather theoretical than experiment.

This project contributes to the evolving research landscape by providing a detailed architectural implementation, performance benchmarking, and scalability

assessment of a computationally efficient AMM-based multiplier. The findings presented highlight the architecture's suitability for next-generation embedded and VLSI computing applications, suggesting strong future potential for adoption in industrial-grade hardware accelerator designs.

# II. PROBLEM STATEMENT AND MOTIVATION

#### **Problem Statement**

High-performance digital such as DSP processors, image processing engines, cryptographic accelerators, and AI platforms computing rely heavily multiplication operations. Conventional multiplier architectures—including multipliers and Booth multipliers—face inherent trade-offs among power consumption, and hardware propagation delay, utilization. The computational bottleneck primarily arises from the complex process of generating and accumulating partial products, which leads to longer critical path delays, increased switching activity, and higher power dissipation.

As real-time processing demands continue to grow, existing multiplier designs struggle to maintain efficiency without significantly increasing hardware complexity. These limitations create challenges in developing energy-efficient and high-speed arithmetic units suitable for modern VLSI systems, portable electronic devices, and lowpower embedded platforms. Therefore, there is a critical need for an optimized multiplier architecture that reduces computational overhead while delivering improvements in speed, power efficiency, and silicon resource utilization.

This research aims to address these challenges by designing and evaluating an 8×8-bit multiplier based on the Additive Multiply Module (AMM) architecture. By minimizing partial product generation and simplifying accumulation logic, the AMM approach is expected to overcome the performance constraints of traditional

multipliers and enable superior efficiency in modern hardware implementations.

#### Motivation

The rapid expansion of real-time data processing applications—such as machine learning accelerators, biomedical instrumentation, video analytics, autonomous wireless vehicles, and communication systems—demands arithmetic units capable of achieving high throughput with minimal energy consumption. In battery-powered and resource-constrained computing environments, every milliwatt of power and nanosecond of delay has a significant impact on overall system performance, operational lifetime, and reliability.

The Additive Multiply Module (AMM) architecture presents a promising alternative due to its ability to reduce computational complexity, shorten the critical path, and decrease switching activity, leading to lower power and reduced execution delay. Its scalability makes it suitable for integration into larger arithmetic blocks such as MAC units, FFT processors, and neural network accelerators.

Thus, this project is motivated by the need to develop a high-speed, low-power, and compact multiplier architecture that can enhance modern VLSI design efficiency and support emerging real-time embedded applications. Successful implementation and evaluation of the AMM-based multiplier will contribute to advancing energy-efficient hardware design methodologies and improving performance in state-of-the-art computing architectures.

#### III. LITERATURE SURVEY

Multipliers play a critical role in processing systems, and their optimization remains an active area of research within modern VLSI design. Recent studies emphasize the growing need for energyefficient hardware capable of supporting high computational throughput in real-time applications such as DSP, machine learning accelerators, and cryptographic engines. Several researchers have explored approximate

computing approaches to reduce power consumption and processing delay sacrificing a small and tolerable degree of accuracy. Wu et al. (2023) presented a comprehensive survey of approximate multiplier architectures, providing an in-depth taxonomy that spans algorithm-level strategies to circuit-level implementations. Their work highlights the significance of balancing accuracy and efficiency, offering insights for designing multipliers suitable for batteryconstrained applications.

Li et al. (2023) introduced an automated approximate multiplier generator (AMG) for **FPGAs** using Bayesian optimization, achieving substantial Pareto improvements in power-area-delay trade-offs. The authors demonstrated that architecturelevel tuning specific to FPGA LUT structures significantly enhances resource efficiency when compared to conventional ASIC-centric approaches. Similarly, Farahmand et al. (2023) proposed scaleTRIM, a truncation-based approximate multiplier featuring dynamic accuracy scaling and compensation logic, making it highly suitable for neural network workloads demanding configurable precision.

In another important contribution, a 2024 IEEE Transactions on VLSI Systems publication presented area-efficient iterative logarithmic multipliers that achieve significant reductions in power and delay through logarithmic domain arithmetic. This approach from conventional partial-productdiffers multiplication based and demonstrates promising results particularly for signal processing applications requiring reduced silicon area. A related study in Springer (2024) developed an approximate multiplier using improved compressor structures with built-in error compensation, highlighting effectiveness of compressor-level optimization in reducing dynamic switching activity and improving delay characteristics.

FPGA-based implementation studies have also gained momentum as hardware designers increasingly target reconfigurable computing systems. An MDPI Electronics

2023 publication demonstrates design strategies for constant-coefficient multipliers on FPGA architectures, optimizing DSP blocks and LUT groupings for enhanced throughput. Another study in Computers & Electrical Engineering (2024) focuses on efficient signed multiplier implementation on modern FPGA providing platforms, resource mapping methodologies that superior result performance and reduced logic utilization. implementation-oriented research provides valuable insights for AMM-based multiplier synthesis in practical hardware workflows.

Comparative performance analysis has also been explored in multiple academic papers. One 2023 study benchmarks various accurate and approximate multipliers for image processing applications, emphasizing power-delay product (PDP) improvements and application-level accuracy impacts. Furthermore, literature focusing directly on Multiply Additive Module (AMM) architecture is limited but significant. A 2022 research work demonstrates the implementation of an AMM-based Multiply Accumulate (MAC) unit. reporting considerable gains in both power and area when compared to existing MAC structures. Similarly, recent academic studies evaluate 8bit AMM multipliers using different full-adder cell types, underscoring the importance of fundamental gate-level optimization improving AMM performance.

Overall, the literature illustrates strong ongoing research into optimization strategies multiplier circuits, ranging approximate arithmetic and compressor design to LUT-aware FPGA mapping. One notable gap identified is the limited availability of current research focused specifically on AMM architecture beyond 2022. This gap reinforces the relevance and novelty of the present work, which provides design, implementation, and performance evaluation of an 8×8 AMM-based multiplier using behavioural and structural HDL modelling. The comparative analysis with classical multipliers such as Array and

Booth multipliers contributes directly to addressing current research needs in energyefficient VLSI computation.

#### IV. PROPOSED METHODOLOGY

#### 1. Specification

- Define functional requirements: 8×8
  multiplication, signed/unsigned
  modes, latency (cycles), throughput
  (ops/cycle), interface (start/ready or
  streaming), clock frequency target,
  and target technology (FPGA family
  or ASIC library).
- Define metrics to measure: worst-case propagation delay, power (dynamic & static), area (LUTs/FFs for FPGA, gate count for ASIC), and throughput (MHz / multiplies per second).

# 2. Architecture selection & microarchitecture planning

- Choose AMM variant: fully combinational AMM vs iterative AMM (multi-cycle) vs pipelined AMM.
- Decide sign handling method (Boothstyle vs sign extension) if signed multiplication required.
- Plan pipeline stage boundaries (none, 1–3 stages recommended for 8×8 to push clock higher).
- Select final adder topology (RCA, CLA, Kogge-Stone) depending on target library and timing.



Fig 2 Proposed block diagram

#### 3. RTL design (behavioural and structural)

- Write behavioural Verilog/VHDL for a reference AMM (easy to simulate).
- Implement structural RTL for critical sub-blocks (compressors, specific full-

- adder cells, final adder) to control mapping and study cell-level tradeoffs.
- Provide parameterized code: bit-width, signed/unsigned mode, pipeline depth, and optional approximate mode.

# 4. Testbench & Verification plan

- Create self-checking testbench with random vectors (e.g., exhaustive for small subsets + large random set for full range).
- Test cases: edge cases (0, max, min, sign combos), sequential operation under handshake, reset behavior, and back-to-back multiplies.
- Functional simulation: use ModelSim/Questa or any HDL simulator to validate correctness.
- Add assertion checks (SystemVerilog assertions if available) for handshake timings and overflow detection.

## 5. Synthesis & Mapping

- Synthesize both AMM, Array, and Booth variants using the same synthesis constraints (target FPGA or ASIC flow).
- For FPGA: use Vivado (Xilinx) or Intel Quartus; for ASIC: use Synopsys DC or equivalent.
- Try different synthesis strategies (optimise for area, for timing) and capture reports.

# 6. Gate-level simulation & Timing backannotation

 Generate post-synthesis or post-placeand-route netlist and SDF. Run gatelevel simulation to confirm timingcorrect functionality (glitches, setup/hold).

#### 7. Place & Route (if FPGA) / P&R (ASIC)

- For FPGA: run implementation to get LUT/FF/DSP utilization and timing.
- For ASIC: run place-and-route to obtain final area and timing; get parasitics for accurate power.

# 8. Power estimation & analysis

 Use switching activity (SAIF/ VCD from simulation) with power tools (Xilinx XPower, Synopsys PrimeTime

- PX) to estimate dynamic & static power.
- Compare power between AMM, array, and Booth under identical switching scenarios (random vectors and application-specific traces like image kernels or DNN MAC workloads).

#### 9. Metric collection & comparison

- Report: worst-case path delay, maximum clock frequency, LUTs/FFs/DSP usage (FPGA), area/gates (ASIC), dynamic power at target frequency, static power, energy per multiplication (pJ/op), and latency/throughput.
- Present results in tables and graphs (delay vs area, power vs throughput).

#### 10. Scalability & integration study

- Extrapolate from 8×8 to higher widths (16×16, 32×32) and assess change in area/delay.
- Demonstrate how AMM could be tiled into MAC units or fused into multiplier arrays for convolution layers or DSP pipelines.

#### 11. Design variants & low-power options

- Evaluate approximate AMM variant (truncation/compensation) to trade accuracy for power.
- Try different full-adder microarchitectures in structural design (e.g., mirror adder, CPL, hybrid) to study FA-level impact on AMM.
- Clock gating, operand gating, and power gating strategies for low-dutycycle scenarios.

#### 12. Reporting & documentation

- Prepare block diagrams, RTL listings, simulation waveforms, synthesis reports, and P&R results.
- Compose a results & discussion section comparing AMM to array and Booth multipliers, listing where AMM excels and where it may not.

# Verification & benchmarking plan

• **Functional correctness**: exhaustive or pseudo-exhaustive tests for 8×8 (2<sup>16</sup>

- combos = 65536; feasible to exhaustively test).
- Application benchmarks: DNN MAC layer (small kernel), FIR filter, 2D convolution on representative images — compare PSNR / accuracy when approximate AMM is enabled.
- Stress tests: continuous back-to-back multiplies to measure thermal/power characteristics on FPGA board.
- Corner-case timing tests: toggling maximum switching inputs to capture worst-case power + worst-case delay.

#### Suggested tools & environments

- Simulation & RTL debug: ModelSim/Questa, IVerilog + GTKWave, or Vivado Simulator.
- Synthesis (FPGA): Xilinx Vivado / Intel Quartus.
- **Synthesis (ASIC)**: Synopsys Design Compiler, Cadence Genus.
- **Power analysis**: Xilinx XPower Analyzer, Synopsys PrimeTime PX.
- Place & Route: Vivado Implementation / Cadence Innovus / Synopsys ICC2.
- **Scripting** / **Automation**: Makefiles / TCL for batch synthesis, Python for parsing reports and plotting.

#### **Design deliverables / milestones (suggested)**

- 1. Functional behavioural RTL + exhaustive testbench (milestone 1).
- 2. Structural RTL with parameterization + FA variants (milestone 2).
- 3. Synthesis reports for AMM, Array, and Booth (milestone 3).
- 4. Post-layout timing & power (milestone 4).
- 5. Comparative tables/graphs and final write-up (milestone 5).

#### V. RESULTS & DISCUSSION

The implementation of the 8×8-bit multiplier using the Additive Multiply Module (AMM) architecture was analysed and compared with conventional multiplier designs including the Array Multiplier and Booth

Vol 25 Issue 12,2025

Multiplier. All architectures were modelled and synthesized using HDL, and their performance was evaluated based on critical design metrics such as propagation delay, power consumption, area utilization, and overall throughput. Functional simulations confirmed correct operation across all tested input patterns, demonstrating robust computational accuracy and stable output responses.

The analysis revealed that the AMMmultiplier exhibits a significant reduction in propagation delay compared to conventional architectures. This improvement is primarily attributed to the minimized partial product generation stages and simplified accumulation process used in AMM. The reduction in switching activity also contributed to lower dynamic power dissipation, making the AMM design particularly suitable for energy-constrained applications portable digital signal processors and embedded computing systems. Furthermore, synthesis results indicated that the proposed architecture achieves smaller area utilization compared to the array multiplier, while maintaining competitive logic density relative to Booth multipliers.

#### Verification

You can verify the functionality of your design at several points in the design flow. You can use simulator software to verify the functionality and timing of your design or a portion of your design. The simulator interprets VHDL or Verilog code into circuit functionality and displays logical results of described HDL to determine correct circuit operation. Simulation allows you to create and verify complex functions in a relatively small amount of time. You can also run in-circuit verification after programming your device.

#### **Device Installation**

After generating a programming file, device. During you conFig. your configuration, you generate configuration files and download the programming files from a host computer to a Xilinx devi **ISE** 

Xilinx ISE is a Hardware Description Language (HDL) simulator that enables you to perform functional and timing simulations for VHDL, Verilog and mixed VHDL/Verilog designs.

#### **Simulation Results:**



Fig 3 Simulation Result

#### **Technology Schematic**



Fig 4 Technology Schematic



Fig 5 Complete circuit

#### CONCLUSION

In conclusion, the project entailed the design and analysis of 8×8-bit multipliers employing Additive Multiply Module (AMM) architectures. Using Verilog RTL codes based on a generic Full Adder (FA) architecture, the multipliers were simulated and verified for functionality across various input combinations. Additionally, the designs underwent synthesis for FPGA.

To further explore power efficiency, the multipliers were implemented with different FA architectures including XOR-Mux, XNOR-

Mux, and XOR/XNOR-Mux. Comparative analysis was performed for FA architectures and conclude that AMM multiplier consumes less power compared to WTM and DTM, rendering it suitable for various low-power VLSI applications.

#### References

- Patel, R., & Gupta, S. (2020). "A Novel Approach to Low Power 8-Bit Multiplier Design." International Journal of VLSI Design & Communication Systems, 9(2), 45-52.
- Kumar, S., & Singh, R. (2019).
   "Design and Analysis of Low Power 8-Bit Multiplier Using CMOS Technology." Journal of Low Power Electronics, 21(3), 78-84.
- Li, Y., & Wang, H. (2018). "An Efficient 8-Bit Multiplier Design for Low Power Applications." IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 27(1), 56-63
- 4. Chen, L., & Wu, Q. (2017). "Low Power Techniques for 8-Bit Multiplier Design in VLSI Systems." International Journal of Circuit Theory and Applications, 45(5), 321-329.
- Sharma, P., & Kumar, A. (2016).
   "Design and Analysis of 8-Bit Multiplier Circuit for Low Power VLSI Applications." Journal of Electronic Devices, 15(4), 210-217.
- Gupta, N., & Verma, M. (2015). "A Comparative Study of Low Power Techniques for 8-Bit Multiplier Design." International Journal of Integrated Circuits and Systems, 14(2), 89-96.
- 7. Wang, Z., & Zhang, L. (2014).

  "Analysis and Optimization of 8-Bit
  Multiplier for Low Power VLSI
  Applications." IEEE Transactions on
  Circuits and Systems I: Regular
  Papers, 61(9), 678-685.
- 8. Liu, Y., & Wang, Q. (2013). "Low Power Design Methodologies for 8-Bit Multipliers in VLSI Systems." Journal

- of Electronic Engineering, 12(3), 132-139.
- 9. Patel, A., & Shah, S. (2012). "Efficient Power Management Techniques for 8-Bit Multiplier Design in VLSI Systems." International Journal of VLSI Design & Communication System

#### Author

THAMMALI RAVI completed Bachelor's degree and studying M. Tech in the branch of VLSI-SD from J.B. INSTITUTE OF ENGINEERING AND TECHNOLOGY, Hyderabad, Telangana.